Abstract

This paper describes two Haskell libraries for property-based testing.
Following the lead of QuickCheck (Claessen and Hughes
2000), these testing libraries SmallCheck and Lazy SmallCheck
also use type-based generators to obtain test-sets of finite values
for which properties are checked, and report any counter-examples
found. But instead of using a sample of randomly generated values
they test properties for all values up to some limiting depth,
progressively increasing this limit. The paper explains the design and
implementation of both libraries and evaluates them in comparison
with each other and with QuickCheck.

Categories and Subject Descriptors D.1.1 [Applicative (Functional)
Programming]; D.2.5 [Software Engineering]: Testing and
Debugging

General Terms Languages, Verification 

Keywords Embedded Language, Property-based Testing, Exhaustive
Search, Lazy Evaluation, Type Classes

1. Introduction 

In their ICFP'00 paper Claessen and Hughes propose an attractive
approach to property-based testing of Haskell programs, as implemented
in their QuickCheck library. Properties relating the component
functions of a program are specified in Haskell itself. The
simplest properties are just Boolean-valued functions, in which the
body is interpreted as a predicate universally quantified over the
argument variables, and a small library of operators provides for
variations such as properties that are conditionally true. QuickCheck
exploits Haskell's type classes to check properties using test-sets
of randomly generated values for the universally-quantified arguments.
If a failing case is discovered, testing stops with a report
showing the counter-example.

Specifying properties in QuickCheck forces programmers to
think hard about what makes their programs correct, and to record
their conclusions in a precise form. Even this preliminary outcome
of exact documentation has value. But the big reward for specifying
properties is that they can be tested automatically, perhaps revealing
bugs.
1.1 Motivation

Although QuickCheck is widely used by Haskell developers, and is 
often very effective, it has drawbacks. The definition of appropriate 
test generators for user-defined types is a necessary prerequisite 
for testing, and it can be tricky to define them so as to obtain a 
suitable distribution of values. However they are defined, if failing 
cases are rare, none may be tested even though some of them are 
very simple; this seems to be an inevitable consequence of using 
randomly selected tests. 

We have therefore developed variations inspired by QuickCheck 
but using a different approach to the generation of test-data. Instead 
of random testing, we test properties for all the finitely many values 
up to some depth, progressively increasing the depth used. For data 
values, depth means depth of construction. For functional values, 
it is a measure combining the depth to which arguments may be 
evaluated and the depth of possible results. 

The principal motivation for this approach can be summarised 
in the following observations, akin to the small scope hypothesis 
behind model-checking tools such as Alloy (Jackson 2006). (1) If 
a program fails to meet its specification in some cases, it almost 
always fails in some simple case. Or in contrapositive form: (2) If 
a program does not fail in any simple case, it hardly ever fails in 
any case. A successful test-run using our tools can give exactly this 
assurance: specified properties do not fail in any simple case. There 
is also a clear demarcation between tested and untested cases. Other
advantages include a simple standard pattern for generators of
user-defined types.

1.2 Contributions 

Our main contributions are: 

1. the design of SmallCheck, a library for property-based testing 
by exhaustive enumeration of small values, including support 
for existential quantification; 

2. the design of Lazy SmallCheck, an alternative which tests properties
for partially-defined values, using the results to prune
test spaces automatically and parallel conjunction to enable further
pruning (currently only first-order properties with universal
quantifiers are supported);

3. a comparative evaluation of these tools and QuickCheck applied 
to a range of example properties. 

1.3 Road-map 

The rest of this paper is arranged as follows. Section 2 reviews 
QuickCheck. Section 3 describes SmallCheck. Section 4 describes 
Lazy SmallCheck. Section 5 is a comparative evaluation. Section 6 
discusses related work. Section 7 suggests avenues for future work 
and concludes. 




2. QuickCheck: a Review 

2.1 Arbitrary Types and Testable Properties 

QuickCheck defines a class of Arbitrary types for which there
are random value generators. There are predefined instances of this
class for most Prelude types. It also defines a class of Testable
property types for which there is a method mapping properties to
test computations. The Testable instances include:
instance Testable Bool
instance (Arbitrary a, Show a, Testable b)
         => Testable (a -> b)

Any Testable property can be tested automatically for some
pre-assigned number of random values using
quickCheck :: Testable a => a -> IO ()
a class-polymorphic test-driver. It reports either success in all cases
tested, or else a counterexample for which the property fails.

Example 1 Suppose the program being tested includes a function
isPrefix :: Eq a => [a] -> [a] -> Bool
that checks whether its first argument is a prefix of its second. One
expected property of isPrefix can be specified as follows.
prop_isPrefix :: [Int] -> [Int] -> Bool
prop_isPrefix xs xs' = isPrefix xs (xs++xs')

The argument variables xs and xs' are understood to be universally
quantified: the result of prop_isPrefix should be True for
all (finite, fully defined) xs and xs'. As prop_isPrefix has a
Testable type (its explicitly declared monomorphic type enables
appropriate instances to be determined) it can now be
tested.

Main> quickCheck prop_isPrefix
OK, passed 100 tests.

Alternatively, if isPrefix actually interprets its arguments the
other way round, the output from quickCheck might be
Falsifiable, after 1 tests:
[1]
[2]
as the property then fails for xs=[1], xs'=[2]. □

2.2 Generators for User-defined Types 

For properties over user-defined types, appropriate Arbitrary instances
must be written to generate random values of these types.
QuickCheck provides various functions that are useful in this task.

Example 2 Consider the following data-type for logical propositions.
To shorten the example, we restrict connectives to negation
and disjunction.

data Prop = Var Name | Not Prop | Or Prop Prop

Assuming that an Arbitrary Name instance is defined elsewhere,
here's how a QuickCheck user might define an Arbitrary Prop
instance.

instance Arbitrary Prop where
  arbitrary = sized arbProp
    where arbProp 0 = liftM Var arbitrary
          arbProp n = frequency
            [ (1,liftM Var arbitrary)
            , (2,liftM Not (arbProp (n-1)))
            , (4,liftM2 Or (arbProp (n `div` 2))
                           (arbProp (n `div` 2))) ]

The sized function applies its argument to a random integer. The
frequency function also abstracts over a random source, choosing
one of several weighted alternatives: in the example, the probability
of a Var construction is 1/7. □

As this example shows, defining generators for recursive types 
requires careful use of controlling numeric parameters. 

2.3 Conditional Properties 

Often the body of a property takes the form of an implication, as it
is only expected to hold under some condition. If implication were
defined simply as a Boolean operator, then cases where the condition
evaluates to False would count as successful tests. Instead
QuickCheck defines an implication operator ==> with the signature
(==>) :: Testable a => Bool -> a -> Property
where Property is a new Testable type. Test cases where the
condition fails do not count.

Example 3 Suppose that an abstract data type for sets is to be 
implemented. One possible representation is an ordered list. Of 
course, sets are unordered collections, but an ordered list permits 
the uniqueness of the elements to be preserved more efficiently by 
the various set operations. 

type Set a = [a] 

Each set operation may assume that the lists representing the 
input sets are ordered, and must ensure that the same is true of any 
output sets. For example, the operation to insert an element into a 
set, of type 

insert :: Ord a => a -> Set a -> Set a 
should preserve the familiar ordered predicate on lists. 

prop_insertSet :: Char -> Set Char -> Property 
prop_insertSet c s = 

ordered s ==> ordered (insert c s) 

If we apply quickCheck to prop_insertSet, few of the cases
generated satisfy the condition, but a larger test-set is used to
compensate. □

This example illustrates a difficulty with conditional properties 
that often arises in practice: what if a condition is rarely satisfied 
by randomly generated values of the appropriate type? QuickCheck 
has to limit the total number of cases generated, whether or not they 
satisfy the condition, so few if any valid tests are performed. The 
recommended solution is to define custom generators, designed to
give only values that satisfy the desired condition. There are three 
main drawbacks of this: (1) writing good custom generators can 
be hard; (2) the property that all and only required values can 
be generated may be hard to verify; (3) properties showing that 
some invariant condition is preserved (a common pattern) must 
express the pre-condition in a generator, but the post-condition in a 
predicate. 
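
To make the difficulty concrete, the following sketch counts how many lists over a small alphabet satisfy the ordered condition. It uses only the Prelude and is not QuickCheck code; the helper names allLists and orderedCount, and the four-letter alphabet, are assumptions made for illustration.

```haskell
-- Illustrative sketch: how rarely does an arbitrary list satisfy `ordered`?

-- The familiar ordered predicate on lists.
ordered :: Ord a => [a] -> Bool
ordered (x:y:zs) = x <= y && ordered (y:zs)
ordered _        = True

-- All lists of exactly n elements drawn from the given alphabet.
allLists :: Int -> [a] -> [[a]]
allLists 0 _     = [[]]
allLists n alpha = [x : xs | x <- alpha, xs <- allLists (n - 1) alpha]

-- (number of ordered lists, number of all lists) of length n over "abcd".
orderedCount :: Int -> (Int, Int)
orderedCount n = (length (filter ordered ls), length ls)
  where ls = allLists n "abcd"
```

For length 5 only 56 of the 1024 lists are ordered, and the proportion keeps shrinking as lists grow, which is why a generator of unconstrained lists yields few valid tests for a property such as prop_insertSet.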

2.4 Higher-order Properties 

Higher-order functions are important components in many Haskell 
programs, and they too have properties that should be tested. One 
of the nice surprises in QuickCheck is that even functional values 
can be generated at random. The details of how this is done are 
quite subtle, but the key is an auxiliary method coarbitrary that 
transforms a generator for the result type in a way that depends on 
a given value of the argument type. 

Functional test values do have the disadvantage that when a test 
fails QuickCheck is not able to show the functional values involved 
in the counter-example. They are displayed only by a «fun» 
place-holder. 



2.5 Test Coverage 

In the opinion of its authors “The major limitation of QuickCheck 
is that there is no measurement of test coverage.” (Claessen and 
Hughes 2000). Users who want to assess coverage of the input 
domain can define functions that compute attributes of test data; 
QuickCheck provides helper functions to report the distribution of 
the attribute values for each randomly generated test-set. But this 
arrangement is quite ad hoc, and it requires extra work. A few 
years on, a coverage tool such as Hpc (Gill and Runciman 2007) 
can provide fine-grained source-coverage information not only for 
the program under test, but also for test-case generators and the 
tested properties. Yet even with 100% coverage of all these sources, 
simple failing cases may never be tested. 


2.6 Counter Examples

A small counter-example is in general easier to analyse than a large
one. QuickCheck, although beginning each series of tests with a
small size parameter and gradually increasing it, is in many cases
unlikely to find a simplest counter-example. To compensate for
this, QuickCheck users may write type-specific shrinking functions.
However, writing shrinking functions requires extra work
and the mechanism still does not guarantee that a reported
counter-example is minimal.


3. SmallCheck

3.1 Small Values

SmallCheck re-uses many of the property-based testing ideas in
QuickCheck. It too tests whether properties hold for finite total values,
using type-driven generators of test cases, and reports
counter-examples. But instead of generating test cases at random, it
enumerates all small test cases exhaustively. Almost all other changes
follow as a result of this one. The principle SmallCheck uses to
define small values is to bound their depth by some small natural
number.

Small Data Structures

Depth is most easily defined for the values of algebraic data types.
As usual for algebraic terms, the depth of a zero-arity construction
is zero, and the depth of a positive-arity construction is one greater
than the maximum depth of a component argument.

Example 2 (revisited) Recalling the data-type Prop of logical
propositions, suppose the Name type is defined by:

data Name = P | Q | R

Then all Name values have depth 0, and the Prop value-construction
Or (Not (Var P)) (Var Q) has depth 3. □

Small Tuples

The rule for tuples is a little different. The depth of the zero-arity
tuple is zero, but the depth of a positive-arity tuple is just
the maximum component depth. Values are still bounded as tuples
cannot have recursive components of the same type.

Small Numeric Values

For primitive numeric types the definition of depth is with reference
to an imaginary representation as a data structure. So the depth
of an integer i is its absolute value, as if it was constructed
algebraically as Succ^i Zero. The depth of a floating point number
s × 2^e is the depth of the integer pair (s, e).

Example 4 The small floating point numbers, of depth no more
than 2, are -4.0, -2.0, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0,
2.0 and 4.0.

Small Functions

Functions generated as test cases should give totally defined results
(given totally defined arguments) so that they do not cause
undefined test computations. There is a natural link between this
requirement and depth-bounded recursion which allows any function
of a data-type argument to be represented non-recursively by
formulating its body as nested case expressions. The depth of a
function represented in this way is defined as the maximum, for
any argument-result pair, of the depth of nested case analysis of
the argument plus the depth of the result.¹ This rule is consistent
with the principle of appealing to an algebraic-term representation:
we are treating each case like a constructor with the bodies of its
alternatives as components.

Example 5 The Bool -> Bool functions of depth zero are:
\b -> True
\b -> False

And those of depth one are:
\b -> case b of {True -> True ; False -> True }
\b -> case b of {True -> True ; False -> False}
\b -> case b of {True -> False ; False -> True }
\b -> case b of {True -> False ; False -> False}

As True and False have no sub-components for deeper analysis,
there are no Bool -> Bool functions of depth two or more. □

3.2 Serial Types

Instead of a class Arbitrary of types with a random value generator,
SmallCheck defines a class Serial of types that can be enumerated
up to a given depth.

Serial Data

For all the Prelude data types, Serial instances are predefined.
Writing a new Serial instance for an algebraic datatype is very
straightforward. It can be concisely expressed using a family of
combinators cons<N>, generic across any combination of Serial
component types, where <N> is constructor arity.

Example 2 (revisited) The Prop datatype has constructors Var
and Not of arity one, and Or of arity two. A Serial instance for it
can be defined by
instance Serial Prop where
  series = cons1 Var \/ cons1 Not \/ cons2 Or

assuming a similar Serial instance for the Name type. □

A series is just a function from depth to finite lists
type Series a = Int -> [a]

and sum and product over two series are defined by
(\/) :: Series a -> Series a -> Series a
s1 \/ s2 = \d -> s1 d ++ s2 d

(><) :: Series a -> Series b -> Series (a, b)
s1 >< s2 = \d -> [(x,y) | x <- s1 d, y <- s2 d]

The cons<N> family of combinators is defined in terms of ><,
decreasing and checking the depth appropriately. For instance:

cons0 c = \d -> [c]
cons1 c = \d -> [c a | d > 0, a <- series (d-1)]
cons2 c = \d -> [c a b | d > 0,
                 (a,b) <- (series >< series) (d-1)]


1 The current implementation by default generates strict functions, and 
counts only nested case occurrences when determining depth. 
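
The series combinators can be exercised in a small self-contained sketch. It restates the definitions of Series, the sum and product operators and cons0 to cons2 from the text, adds type signatures and fixity declarations (both assumptions, chosen so that the Prop instance parses as intended), and counts the Prop values available at each depth bound. The name propCounts is hypothetical.

```haskell
-- Self-contained sketch of the series combinators described in the text.
type Series a = Int -> [a]

class Serial a where
  series :: Series a

-- Assumed fixities: product binds tighter than sum.
infixr 7 \/
infixr 8 ><

-- Sum of two series: values from either, at the same depth bound.
(\/) :: Series a -> Series a -> Series a
s1 \/ s2 = \d -> s1 d ++ s2 d

-- Product of two series: all pairs, at the same depth bound.
(><) :: Series a -> Series b -> Series (a, b)
s1 >< s2 = \d -> [(x, y) | x <- s1 d, y <- s2 d]

-- Nullary constructors are available at every depth.
cons0 :: a -> Series a
cons0 c = \_ -> [c]

-- Each constructor layer costs one unit of depth.
cons1 :: Serial a => (a -> b) -> Series b
cons1 c = \d -> [c a | d > 0, a <- series (d - 1)]

cons2 :: (Serial a, Serial b) => (a -> b -> c) -> Series c
cons2 c = \d -> [c a b | d > 0, (a, b) <- (series >< series) (d - 1)]

data Name = P | Q | R deriving (Eq, Show)
data Prop = Var Name | Not Prop | Or Prop Prop deriving (Eq, Show)

instance Serial Name where
  series = cons0 P \/ cons0 Q \/ cons0 R

instance Serial Prop where
  series = cons1 Var \/ cons1 Not \/ cons2 Or

-- Number of Prop values for depth bounds 0 to 3.
propCounts :: [Int]
propCounts = [length (series d :: [Prop]) | d <- [0 .. 3]]
```

Evaluating propCounts gives [0,3,15,243]: no Prop fits in depth 0, and each further level adds one layer of Not and Or over everything available one level below.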





Serial Functions 

To generate functions of type a->r requires, in addition to a
Serial instance for result type r, an auxiliary method coseries
for argument type a, analogous to QuickCheck's coarbitrary.
Again predefined combinators support a standard pattern of definition:
this time the alts<N> family to generate case alternatives.
Example 2 (revisited) Here is a coseries definition for the
Prop datatype of propositions, using the standard pattern.
coseries rs d = [ \p -> case p of
                         Var n -> var n
                         Not p1 -> not p1
                         Or p1 p2 -> or p1 p2
                | var <- alts1 rs d ,
                  not <- alts1 rs d ,
                  or <- alts2 rs d ]

Explicit fresh variable names are needed (1) in the case alternatives
of the lambda body, (2) in the function-list generators, and
(3) to pass on the result series and the bounding depth. So the pattern
of definition here, though still straightforward, is more verbose
than for series. □

The first few members of the alts<N> family are defined by:
alts0 as d = as d
alts1 bs d = if d > 0
             then coseries bs (d-1)
             else [\_ -> x | x <- bs d]
alts2 cs d = if d > 0
             then coseries (coseries cs) (d-1)
             else [\_ _ -> x | x <- cs d]

For programs with many or large datatype definitions, mechanical
derivation of Serial instances is preferable. The standard patterns
are sufficiently regular that they can be inferred by the Derive
tool (Mitchell and O'Rear 2007), for example.

3.3 Testing 

Just as QuickCheck has a top-level function quickCheck so SmallCheck
has smallCheck d.

smallCheck :: Testable a => Int -> a -> IO ()

It runs series of tests using depth bounds 0..d, stopping if any test
fails, and prints a summary report. An interactive variant

smallCheckI :: Testable a => a -> IO ()

invites the user to decide after each completed round of tests, and
after any failure, whether to continue.

Example 6 Consider testing the (ill-conceived) property that all
Boolean operations are associative.
prop_assoc op = \x y z ->
    (x `op` y) `op` z == x `op` (y `op` z)
  where typeInfo = op :: Bool -> Bool -> Bool
Testing soon uncovers a failing case:

Main> smallCheckI prop_assoc
Depth 0:
Failed test no. 22. Test values follow.
{True->{True->True; False->True};
 False->{True->False; False->True}}
False
True
False

Being able to generate a series of all (depth-bounded) values of
an argument type, SmallCheck can give at least partial information
about the extension of a function. □
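
The depth-iterating behaviour of the test driver can be sketched without the library. The code below is an assumption-laden miniature, not SmallCheck's implementation: checkUpTo, boolLists and propRevId are hypothetical names, and the series of Boolean lists is a stand-in chosen for brevity.

```haskell
-- Sketch of a driver that tests depth bounds 0..dMax in turn.
type Series a = Int -> [a]

-- Stand-in series: lists of Booleans with at most d elements.
boolLists :: Series [Bool]
boolLists d = concat [sequence (replicate n [False, True]) | n <- [0 .. d]]

-- Report the first failure as (depth bound, test number at that
-- depth, counter-example), or Nothing if every test succeeds.
checkUpTo :: Int -> Series a -> (a -> Bool) -> Maybe (Int, Int, a)
checkUpTo dMax s prop =
  case [ (d, n, x) | d <- [0 .. dMax]
                   , (n, x) <- zip [1 ..] (s d)
                   , not (prop x) ] of
    []      -> Nothing
    (r : _) -> Just r

-- An ill-conceived property: reversing a list leaves it unchanged.
propRevId :: [Bool] -> Bool
propRevId xs = reverse xs == xs
```

Here checkUpTo 3 boolLists propRevId returns Just (2,5,[False,True]): depths 0 and 1 pass, and the fifth test at depth 2 is the first list that is not a palindrome. Because shallow values are enumerated first, the reported counter-example is a smallest one with respect to this series.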


3.4 Properties & the Pragmatics of Implication 

The language of testable properties in SmallCheck is deliberately
very close to that in QuickCheck. (It omits operators for gathering
statistics about attributes as their main use in QuickCheck is to
obtain information about the actual distribution of randomly generated
tests.) As in QuickCheck, the ==> operator can be used to
express a restricting condition under which a property is expected
to hold. Again separate counts are maintained of tests that satisfy
the condition and tests that do not, but the operational semantics
of ==> are different. Regardless of the counts, the full (finite) set
of tests is applied exhaustively, unless a failing counter-example
brings testing to a halt. The following example illustrates an important
pragmatic rule for property writers.

Example 2 (revisited) Recall again the type Prop of logical
propositions. Suppose there are functions
eval :: Prop -> Env -> Bool
tautology :: Prop -> Bool

where eval evaluates the truth of a proposition in a given environment,
and tautology is some procedure to decide whether a
proposition is true in every environment. We expect the following
property to hold.

prop_tautEval :: Prop -> Env -> Property
prop_tautEval p e = tautology p ==> eval p e

However, if the property is so defined, SmallCheck may test all
possible combinations of values for p and e. We are using an
embedded property language and there is no reflective facility by
which SmallCheck can itself discover that the condition depends
only on p. The following alternative formulation avoids the problem.

prop_tautEval' p = tautology p ==> \e -> eval p e

Now SmallCheck only tests cases involving the minority of propositions
p for which tautology p holds. Although the difference between
the two formulations is of no consequence in QuickCheck,
as for each test it generates a single random pair of values p and e,
in SmallCheck the second is clearly to be preferred.
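
The saving can be estimated with a small counting sketch. Everything here is an illustrative assumption rather than library code: propositions range over two names only, an environment is represented as the list of names that are true, and flatCases and nestedCases are hypothetical helpers that count the cases each formulation would enumerate.

```haskell
data Name = P | Q deriving (Eq, Show)
data Prop = Var Name | Not Prop | Or Prop Prop deriving Show

-- An environment lists the names that are true.
type Env = [Name]

eval :: Prop -> Env -> Bool
eval (Var n)  e = n `elem` e
eval (Not p)  e = not (eval p e)
eval (Or p q) e = eval p e || eval q e

envs :: [Env]
envs = [[], [P], [Q], [P, Q]]

tautology :: Prop -> Bool
tautology p = all (eval p) envs

-- All propositions of depth at most d (names have depth 0).
props :: Int -> [Prop]
props d
  | d <= 0    = []
  | otherwise = map Var [P, Q] ++ map Not sub ++ [Or p q | p <- sub, q <- sub]
  where sub = props (d - 1)

-- Flat formulation: every (p, e) pair is generated before the
-- condition is examined.
flatCases :: Int -> Int
flatCases d = length [() | _ <- props d, _ <- envs]

-- Nested formulation: one condition check per p, then environments
-- only for those p that pass it.
nestedCases :: Int -> Int
nestedCases d =
  length (props d) + length [() | p <- props d, tautology p, _ <- envs]
```

At depth 3 there are 74 propositions, so flatCases 3 is 296, while nestedCases 3 is smaller because environments are enumerated only for the few tautologies. The gap widens quickly with larger environments.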

3.5 Existential Properties 

SmallCheck extends the property language to permit existential 
quantifiers. Testing a random sample of values as in QuickCheck 
would rarely give useful information about an existential property: 
often there is a unique witness and it is most unlikely to be selected 
at random. But SmallCheck can exhaustively search for a small 
witness. There are several existential variants, but the basic one has 
the following signature. 

exists :: (Show a, Serial a, Testable b) => 

(a -> b) -> Property 

The interpretation of exists f is that for some argument x testing 
the result f x succeeds. To illustrate the application of existentials, 
and some issues with their use, we begin with a reappraisal of 
previous examples. 

Example 1 (revisited) The only property of isPrefix specified
so far is:

prop_isPrefix xs xs' = isPrefix xs (xs++xs')

This property is necessary but not sufficient for a correct isPrefix.
For example, it holds for the erroneous definition

isPrefix [] ys = True
isPrefix (x:xs) [] = False
isPrefix (x:xs) (y:ys) = x==y || isPrefix xs ys

or even for an isPrefix that always returns True! In terms of the
following full specification for isPrefix
$\forall xs\ \forall ys\ (\mathit{isPrefix}\ xs\ ys \iff \exists xs'\ (xs \mathbin{+\!\!+} xs' = ys))$
the partial specification prop_isPrefix captures only the $\Leftarrow$
direction, re-expressing the existential implicitly by the introduction
of xs' rather than ys as the second variable in the property.
Viewing isPrefix as a decision procedure, prop_isPrefix assures
its completeness but ignores its soundness.

Using SmallCheck, we can test for soundness too. The ==>
direction of the specification can be expressed like this:
prop_isPrefixSound xs ys =
  isPrefix xs ys ==>
    exists $ \xs' -> xs++xs' == ys
Testing prop_isPrefixSound for the erroneous definition of
isPrefix gives:

Main> smallCheckI prop_isPrefixSound
Depth 2:
Failed test no. 11. Test values follow.
[-1]
[0]
non-existence
Continue?

The nearest a QuickCheck user can get to the soundness property
is a constructive variant introducing a Skolem function, e.g.
prop_isPrefixSound' xs ys =
  isPrefix xs ys ==> xs ++ skolem xs ys == ys
  where skolem = drop . length
A Skolemised formulation of this kind demands extra information
compared to the existential original, making the property harder
to read. A more significant drawback is that a suitable Skolem
function has to be invented and correctly defined. In this example
it is both simple and unique, but that is often not so. □

Example 2 (revisited) For a decision procedure such as 
satisfiable :: Prop -> Bool 
we can similarly define a soundness property. 
prop_satSound p = 

satisfiable p ==> exists $ \e -> eval p e 

But this time there is no unique Skolem function (p may be true 
in many different environments e), nor is there a simple choice of 
such a function that can be defined as a one-liner. □ 
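
A bounded existential search is easy to sketch with plain lists. The code below is a minimal illustration under stated assumptions, not the library's exists: the series ints and lists are simplified stand-ins, and existsUpTo and soundAt are hypothetical names. It applies the search to the soundness property of the erroneous isPrefix from Example 1.

```haskell
type Series a = Int -> [a]

-- Integers of absolute value at most d.
ints :: Series Int
ints d = [-d .. d]

-- Lists with at most d cons cells; each layer costs one unit of depth.
lists :: Series a -> Series [a]
lists s d
  | d <= 0    = [[]]
  | otherwise = [] : [x : xs | x <- s (d - 1), xs <- lists s (d - 1)]

-- Bounded existential: is there a witness of depth at most d?
existsUpTo :: Int -> Series a -> (a -> Bool) -> Bool
existsUpTo d s p = any p (s d)

-- The erroneous definition from the text (|| where && is intended).
isPrefixBad :: Eq a => [a] -> [a] -> Bool
isPrefixBad []     _      = True
isPrefixBad (_:_)  []     = False
isPrefixBad (x:xs) (y:ys) = x == y || isPrefixBad xs ys

-- Counter-examples to soundness at depth d: the procedure answers
-- True, yet no suffix of depth at most d completes xs to ys.
soundAt :: ([Int] -> [Int] -> Bool) -> Int -> [([Int], [Int])]
soundAt isPre d =
  [ (xs, ys) | xs <- lists ints d, ys <- lists ints d
             , isPre xs ys
             , not (existsUpTo d (lists ints) (\zs -> xs ++ zs == ys)) ]
```

With these stand-in series, soundAt isPrefixBad 1 is empty and the first element of soundAt isPrefixBad 2 is ([-1],[0]), the same pair of values as in the report above; the agreement depends on the enumeration order assumed here.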

Unique Existentials 

For some existential properties it is important that there is a unique
witness. A formulation based on the equivalence
$\exists! x\, (P\ x) \iff \exists x\, (P\ x \land \forall y\, (P\ y \Rightarrow y = x))$
would be cumbersome to write, inefficient to test and cannot be
used for types outside the Eq class, such as functions. So SmallCheck
defines the variant exists1. When unique existential properties
are tested, any failure reports conclude with “non-existence”
or “non-uniqueness” followed by two witnesses.

Depth of Existential Searches 

The default testing of existentials is bounded by the same limiting
depth as for universals. This rule has important consequences. A
universal property may be satisfied when the depth-bound on test
values is shallow but fail when it is deeper. Dually, an existential
property may only succeed if the depth-bound on test-values is
large enough. So when testing properties involving existentials it
can make sense to continue with deeper testing after a shallow
failure.

Sometimes the default same-depth-bound interpretation of existential
properties can make testing of a valid property fail at all
depths. SmallCheck provides customising existential quantifiers for
use in such circumstances. They take as an additional argument an
Int->Int function that transforms the depth-bound for testing.

Example 7 The property 

prop_apex :: [Bool] -> [Bool] -> Property 
prop_apex xs ys = exists $ \zs -> zs == xs++ys 
inevitably fails at all depths greater than zero, but the variant 
prop_apex’ xs ys = 

existsDeeperBy (*2) $ \zs -> zs == xs++ys 
succeeds at all depths. □
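
The effect of transforming the depth bound can be checked with a list-based sketch. The names boolLists, existsDeeperBy' and apexHolds are hypothetical stand-ins, not the library's definitions, and the series assumes that a list of depth d has at most d elements.

```haskell
type Series a = Int -> [a]

-- Lists of Booleans with at most d elements.
boolLists :: Series [Bool]
boolLists d
  | d <= 0    = [[]]
  | otherwise = [] : [x : xs | x <- [False, True], xs <- boolLists (d - 1)]

-- Existential search whose depth bound is transformed by f.
existsDeeperBy' :: (Int -> Int) -> Int -> ([Bool] -> Bool) -> Bool
existsDeeperBy' f d p = any p (boolLists (f d))

-- Does the append property hold for all xs and ys at depth d?
apexHolds :: (Int -> Int) -> Int -> Bool
apexHolds f d =
  and [ existsDeeperBy' f d (\zs -> zs == xs ++ ys)
      | xs <- boolLists d, ys <- boolLists d ]
```

With the identity transformation, apexHolds id d is False for every d greater than zero, because two lists of length d append to one of length 2d, which lies beyond the bound. With (*2) the witness is always within reach, so apexHolds (*2) d is True.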

3.6 Dealing with Large Test Spaces 

Using the standard generic scheme to define series of test values, it 
often turns out that at some small depth d the 10,000-100,000 tests 
are quickly checked, but at depth d+1 it is infeasible to complete 
the billions of tests. This combinatorial explosion is an inevitable 
consequence of relentlessly increasing a uniform depth-limit for 
exhaustive testing. We need ways to reduce some dimensions of 
the search space so that in other dimensions it can be tested more 
deeply. 

Small Base Types 

Although numbers may seem an obvious choice for basic test values,
the test-spaces for compound types (and particularly functional
types) with numeric bases grow very rapidly as depth increases. For
many properties, Bool or even () is a perfectly sensible choice of
type for some variables, greatly reducing the test-space to be covered.

Depth-Adjustment and Filtering 

As in QuickCheck customisation can be achieved by using specialised
test-data generators to redefine the scope of some or all
quantified properties. Instead of defining completely fresh generators
by ad hoc means, there are two natural techniques for adapting
the standard machinery. A series generator for type t is just a
function of type Int -> [t]. It can be composed to the left of a
depth adjustment function of type Int -> Int, or to the right of
a filtering function of type [t] -> [t], or both. So although each
constructor-layer in an algebraic data value normally adds one to
the depth, this default is easily over-ridden: we can assign any preferred
non-negative integer depth to a constructor by composing
cons<N> and alts<N> applications in Serial methods with applications
of a depth function. And if in some context it is appropriate
to restrict values to a subseries, a tool-box of list-trimming
functions is readily available.

Example 2 (revisited) Both techniques can be illustrated using
the Prop data-type. Here is a series generator adapted so that
Or constructors have a depth cost of two, and propositions are
restricted to two variables only.
instance Serial Prop where
  series = take 2 . cons1 Var
        \/ cons1 Not
        \/ cons2 Or . depth 2

By adapting a series definition in this way, rather than arbitrarily
reprogramming it, we can still see easily which Props are included
in the test set. Table 1 shows the effect on the number of tests. □
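
The sizes of these test sets follow a simple recurrence, which gives a way to check the entries of Table 1 without enumerating anything. The sketch assumes that a Var costs one level, a Not costs one level, and an Or costs orCost levels; propCount and table1 are hypothetical names.

```haskell
-- Number of Prop values at depth bound d, given v variable names and
-- an Or constructor that costs orCost levels of depth.
propCount :: Integer -> Int -> Int -> Integer
propCount v orCost d
  | d <= 0    = 0
  | otherwise = v                                      -- Var
              + propCount v orCost (d - 1)             -- Not
              + propCount v orCost (d - orCost) ^ (2 :: Int)  -- Or

-- Rows for depths 1..7: (default, 2-Var, Or-2, both adjustments).
table1 :: [(Integer, Integer, Integer, Integer)]
table1 =
  [ (propCount 3 1 d, propCount 2 1 d, propCount 3 2 d, propCount 2 2 d)
  | d <- [1 .. 7] ]
```

For instance, the Or-2 column comes out as 3, 6, 18, 57, 384, 3636, 151095 and the column with both adjustments as 2, 4, 10, 28, 130, 916, 17818, while the unadjusted count is already 243 at depth 3 and grows roughly by squaring at each further level.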



Depth              Adjustments made
d           None       2-Var      Or-2      Both
1              3           2         3         2
2             15           8         6         4
3            243          74        18        10
4          59295        5552        57        28
5              -    30830258       384       130
6              -           -      3636       916
7              -           -    151095     17818

Table 1. The numbers of Prop test cases for depth limits from 1 to
7: the standard default, with at most 2 variables, with Or adjusted
to depth 2, and with both adjustments.

Depth       Number of tests at depth
d           1..5     6..10    11..15    16..20
3             13         1         0         0
4             25        38         0         0
5             30        82
6             30       511      1132       452

Table 2. The distribution of standard depths for ordered lists generated
using a bijection from lists of naturals at depth d.

Constrained Newtypes and Bijective Representations 

Frequently, there are restrictions such as data invariants on the 
appropriate domain of test-cases. As we have seen, such restrictions 
can often be expressed in antecedent conditions of properties, but 
that solution does not prevent the generation of useless tests. A 
natural alternative in a type-driven framework, and one familiar to 
QuickCheck users, is to define a distinct newtype, of tagged values 
satisfying a restriction, and a custom generator for those values. 
Example 8 Perhaps the most frequently occurring simple example
is that integers must often be limited to the natural numbers.
Rather than defining a conditional property such as
prop_takeLength :: Int -> [()] -> Property
prop_takeLength i xs =
  i >= 0 ==> length (take i xs) == min i (length xs)
assuming a suitable Serial instance for a Nat type
newtype Integral a => N a = N a
type Nat = N Int
we can instead define:

prop_takeLength' :: Nat -> [()] -> Property
prop_takeLength' (N n) xs =
  length (take n xs) == min n (length xs)

Note the appropriate use of unit list elements. □

Section 2.3 pointed out drawbacks of custom generators, such
as the difficulty of ensuring correspondence with a restricting predicate.
In SmallCheck there is also the requirement for a consistent
view of depth. One way to address these issues is to find an auxiliary
representation type from which there is a bijective mapping to
the restricted values. Values of this representation type are generated
in the standard way, and used to measure depth. The required
correspondence can be expressed as testable properties.

Example 8 (revisited) The Nat example is so simple that it hardly
needs this technique, but let it serve as a simple first illustration,
using a bijection between the natural numbers and unit lists.
instance Integral a => Serial (N a) where
  series = map (N . genericLength)
         . (series :: Series [()])

We verify that generated Nats uniquely represent all and only the
non-negative Ints by testing the properties
prop_natSound :: Nat -> Bool
prop_natSound (N i) = i >= 0

prop_natCompl :: Int -> Property
prop_natCompl i =
  i >= 0 ==> exists1 $ \(N n) -> i == n
□


Example 3 (revisited) As a more typical example, consider a
newtype for ordered lists of naturals.

newtype OrdNats = OrdNats [Nat]

We can exploit a bijection between ordered and unordered lists:
instance Serial OrdNats where
  series = map (OrdNats . scanl1 plus) . series
    where plus (N a) (N b) = N (a+b)

Given that scanl1 plus really is a bijection, the number of lists
generated at each depth must, of course, be exactly the same as for
the default series method. But now every one of them is an ordered
list. In place of un-ordered lists, most test cases are now ordered
lists that only occur at much greater depth in the default series,
beyond feasible reach using brute enumeration. See Table 2. □
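
The two bijections can be tried out with ordinary lists. This sketch drops the N and OrdNats wrappers for brevity and uses plain Int; the series unitLists, nats, natLists and ordNats are simplified stand-ins written for this illustration, not the library's instances.

```haskell
type Series a = Int -> [a]

-- Unit lists with at most d elements; their lengths are the naturals 0..d.
unitLists :: Series [()]
unitLists d = [replicate n () | n <- [0 .. d]]

nats :: Series Int
nats = map length . unitLists

-- Lists of naturals, following the standard pattern for lists.
natLists :: Series [Int]
natLists d
  | d <= 0    = [[]]
  | otherwise = [] : [n : ns | n <- nats (d - 1), ns <- natLists (d - 1)]

-- Running sums turn any list of naturals into an ordered list.
ordNats :: Series [Int]
ordNats = map (scanl1 (+)) . natLists

ordered :: Ord a => [a] -> Bool
ordered (x:y:zs) = x <= y && ordered (y:zs)
ordered _        = True
```

At depth 4, natLists yields 65 lists, and ordNats yields 65 distinct lists that are all ordered. No test is wasted on a list that violates the invariant, which is the point of generating through the bijection rather than filtering.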

For some kinds of properties, testing by brute-enumeration of 
values using SmallCheck is only feasible with a shallow depth- 
limit. We have illustrated some customisation techniques that can 
enable deeper testing, but for a more powerful remedy across the 
general class of data-driven tests we now turn to Lazy SmallCheck. 

4. Lazy SmallCheck 

A consequence of lazy evaluation in Haskell is that functions can 
return results when applied to partially-defined inputs. To illustrate, 
consider the following Haskell function ordered. 

ordered [] = True 
ordered [x] = True 

ordered (x:y:zs) = x <= y && ordered (y:zs) 

When applied to 1:0:⊥, where ⊥ is a call to Haskell’s error
function, ordered returns False. Indeed, ordered (1:0:xs)
is False for every xs. Thus, by applying a function to a single 
partially-defined input, one can observe its result over many fully- 
defined ones. 

This ability to see the result of a function on many inputs in 
one go is very attractive to property-based testing: if a property 
holds for a partially-defined input then it will also hold for all fully- 
defined refinements of that input. The aim of Lazy SmallCheck is 
to avoid generating such fruitless refinements. 
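
A minimal sketch of this observation follows. The element type Int and the names partial and result are assumptions made for illustration.

```haskell
ordered :: [Int] -> Bool
ordered []       = True
ordered [_]      = True
ordered (x:y:zs) = x <= y && ordered (y:zs)

-- A partially-defined list: everything after the first two elements is bottom.
partial :: [Int]
partial = 1 : 0 : error "_|_"

-- The undefined tail is never demanded, so this evaluates without an exception.
result :: Bool
result = ordered partial
```

Here result is False, because the comparison 1 <= 0 already falsifies the conjunction before the tail is inspected.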

Lazy SmallCheck is a compatible subset of SmallCheck, currently
only capable of checking first-order properties with universal
quantifiers. It requires no extensions to standard Haskell other than 
the ability to detect evaluation of error, and this facility is already 
supported by the main Haskell implementations through imprecise 
exceptions (Peyton Jones et al. 1999). 

4.1 Implication 

In SmallCheck and QuickCheck the ==> operator returns a value 
of type Property, allowing tests falsifying the antecedent to be 
observed. This facility is less useful in Lazy SmallCheck which 
tends not to generate many tests falsifying the antecedent, so ==> 
simply has the type Bool -> Bool -> Bool. 




Example 3 (revisited) The property prop_insertSet states that 
insert should preserve the ordered invariant defined above. 

prop_insertSet c s = 

ordered s ==> ordered (insert (c :: Char) s) 

Using SmallCheck, the property can be tested for all inputs up 
to a given depth using the depthCheck function. 

Main> depthCheck 7 prop_insertSet 
Depth 7: 

Completed 109600 test(s) without failure. 

But 108576 did not meet ==> condition. 

Passing the property to Lazy SmallCheck’s depthCheck function
instead yields

Main> depthCheck 7 prop_insertSet 
OK, required 1716 tests at depth 7 


Both testing libraries use the same definition of depth, so the 
input-space checked by each is identical. The difference is that 
by generating partially-defined inputs, Lazy SmallCheck is able
to perform the check with fewer tests. To see why, observe that
prop_insertSet applied to the partially-defined inputs ⊥ and
'b':'a':⊥ is True. Replacing each ⊥ with more-defined values
is unnecessary because prop_insertSet will not look at them.

Example 3 will be used throughout the next three sections to 
illustrate further points about Lazy SmallCheck. 

4.2 Laziness is Delicate 

The set invariant in the example can be strengthened. Not only 
should the list representing a set be ordered, but it should also 
contain no duplicates, as expressed by the function allDiff. 

allDiff [] = True 

allDiff (x:xs) = x `notElem` xs && allDiff xs

The stronger invariant is 

isSet s = ordered s && allDiff s

and prop_insertSet can be modified to use it. 

prop_insertSet c s = isSet s ==> isSet (insert c s) 

The isSet invariant reduces the number of tests generated by 
Lazy SmallCheck. 

Main> depthCheck 7 prop_insertSet 
OK, required 964 tests at depth 7 

This is because some lists satisfy ordered but not allDiff, so
there is increased scope for falsifying the condition without demanding
the value of the element being inserted.

However, now suppose that the conjuncts of isSet are reversed. 

isSet s = allDiff s && ordered s

Checking prop_insertSet now requires some twenty times more 
tests than the version with the original conjunct ordering. 

Main> depthCheck 7 prop_insertSet 
OK, required 20408 tests at depth 7 

The problem is that && evaluates its left-hand argument first, and
allDiff is less restrictive than ordered in this case. 
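
The sensitivity to conjunct order can be observed directly on a single partially-defined list. The sketch below is an illustration only: the names isSetA, isSetB and probe are hypothetical, the element type is fixed to Int, and Nothing is used to signal that the undefined tail was demanded.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

ordered, allDiff :: [Int] -> Bool
ordered (x:y:zs) = x <= y && ordered (y:zs)
ordered _        = True

allDiff []     = True
allDiff (x:xs) = x `notElem` xs && allDiff xs

-- The two conjunct orderings discussed in the text.
isSetA, isSetB :: [Int] -> Bool
isSetA s = ordered s && allDiff s
isSetB s = allDiff s && ordered s

-- Apply a predicate to 1:0:_|_ and report Nothing if the tail was demanded.
probe :: ([Int] -> Bool) -> IO (Maybe Bool)
probe p = do
  r <- try (evaluate (p (1 : 0 : error "_|_"))) :: IO (Either ErrorCall Bool)
  return (either (const Nothing) Just r)
```

With these definitions, probe isSetA returns Just False, whereas probe isSetB returns Nothing: allDiff must scan the whole tail before ordered is ever consulted, so a lazy checker would have to refine the input further.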


4.3 Parallel Conjunction 

When a property is composed of several sub-constraints, like
isSet, putting the most restrictive one first helps Lazy SmallCheck
reduce the number of tests. But it is not always clear what
the order should be. In fact, the best order may differ depending on
the depth at which the property is checked.

Lazy SmallCheck provides the user with an alternative to normal
conjunction called parallel conjunction and represented by
*&*. A parallel conjunction is falsified if any of its conjuncts are.
This is in contrast to a standard conjunction which returns ⊥ if its
first argument is ⊥, even if its second is falsified. Replacing && with
*&* can reduce the need to place the conjuncts in a particular order,
and can decrease the number of required tests.

The function *&* is defined in a datatype called Property,
extending Bool to allow the distinction between sequential and
parallel conjunction. Boolean values must be explicitly lifted to
properties. After switching to *&* the example property becomes

isSet :: Ord a => Set a -> Property 

isSet s = lift (ordered s) *&* lift (allDiff s) 

prop_insertSet :: Char -> Set Char -> Property 
prop_insertSet c s = 

isSet s *=>* isSet (insert c s) 

(Property implication in Lazy SmallCheck is denoted *=>*.) 

The parallel variant of isSet reduces the number of tests compared
to either of the non-parallel ones.

Main> depthCheck 7 prop_insertSet 
OK, required 653 tests at depth 7 

This is because some lists falsify ordered but not allDiff, e.g.
1:0:⊥, and vice-versa, some falsify allDiff but not ordered,
e.g. 0:0:⊥.

Now suppose again that the conjuncts are reversed. 
isSet s = lift (allDiff s) *&* lift (ordered s) 

This time the number of tests does not change, highlighting that 
parallel conjunction is not as sensitive to the order of the conjuncts. 

Main> depthCheck 7 prop_insertSet 
OK, required 653 tests at depth 7 

Despite the advantages of parallel conjunction, it must be introduced
manually, with care. An automatic rewrite is not possible,
since switching to *&* may expose intended partiality in the second
conjunct. The first conjunct of && can be used as a guard which
assures that the input has a certain property before evaluating the
second one. With *&* such guards disappear and the property may
crash unfairly.

Having to lift Booleans to properties does introduce an unfortunate
notational burden. Overloaded Booleans (Augustsson 2007)
would be really helpful here.

4.4 Strict Properties 

Not all properties are as lazy as prop_insertSet. To illustrate, 
consider the following function that turns a list into a set, throwing 
away duplicates. 

set :: Ord a => [a] -> Set a 
set = foldr insert [] 

We might like to verify that set always returns valid sets. 

prop_set :: [Char] -> Bool 
prop_set cs = isSet (set cs) 




To return True, prop_set demands the entire input, so there is 
no scope for the property to be satisfied by a partially-defined input. 
Checking with SmallCheck yields 

Main> depthCheck 6 prop_set 
Depth 6: 

Completed 1957 test(s) without failure.

and with Lazy SmallCheck:

Main> depthCheck 6 prop_set 

OK, required 2378 tests at depth 6 

Not only is Lazy SmallCheck of no benefit in this case, but it 
is worse than SmallCheck because it fruitlessly generates some 
partially-defined inputs as well as all the totally-defined ones. 

4.5 Serial Types 

Lazy SmallCheck also provides a Serial class with a series 
method. But now a series has the type 

type Series a = Int -> Cons a 

From the user’s perspective, Cons a is an abstract type storing
instructions on how to construct values of type a. It has the following
operations.

empty :: Series a 

(\/) :: Series a -> Series a -> Series a 

(><) :: Series (a -> b) -> Series a -> Series b 

Unlike in SmallCheck, the >< operator represents application 
rather than cross product. To illustrate, SmallCheck’s cons<N> 
family of operators is defined in Lazy SmallCheck in the following 
fashion. 

cons0 f = cons f
cons1 f = cons f >< series
cons2 f = cons f >< series >< series

So SmallCheck Serial instances defined using the standard pattern
are written identically in Lazy SmallCheck.

Depth Customisation 

The left-associative >< combinator implicitly takes a depth d, and 
passes d to its left argument and d-1 to its right argument. The 
result is that each child of a constructor is given depth d-1, like in 
SmallCheck. If the depth argument to >< is zero, then no values can 
be constructed. 
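
The depth discipline of >< can be illustrated with a deliberately simplified model in which a series is a function from depth to a list of values. This is not the Cons representation used by the library. The fixity declarations and the names bools and pairs are assumptions made for the sketch.

```haskell
infixr 3 \/
infixl 4 ><

-- Simplified model (an assumption): a series maps a depth to a finite list.
type Series a = Int -> [a]

cons :: a -> Series a
cons a _ = [a]

(\/) :: Series a -> Series a -> Series a
(a \/ b) d = a d ++ b d

-- Application: the function series sees depth d, its argument sees d-1,
-- and nothing at all is produced when the depth is zero.
(><) :: Series (a -> b) -> Series a -> Series b
(f >< a) d = [ g x | d > 0, g <- f d, x <- a (d - 1) ]

bools :: Series Bool
bools = cons False \/ cons True

-- Because >< associates to the left, both components receive depth d-1.
pairs :: Series (Bool, Bool)
pairs = cons (,) >< bools >< bools
```

In this model pairs 1 contains all four pairs of Booleans, while pairs 0 is empty.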

Example 9 Suppose a generator for rose trees (Bird 1998) is to 
be written. 

data Rose a = Node a [Rose a] 

The standard list generator might be deemed inappropriate to generate
the children of a node, because each child would be generated
to a different depth. Instead, the programmer might write

instance Serial a => Serial (Rose a) where 
series = cons Node >< series >< children 

where children generates a list of values, each of which is 
bounded by the same depth parameter. 

children d = list d
  where
    list = cons []
        \/ cons (:) >< const (series (d-1)) >< list
□


Primitive Types 

Like in SmallCheck, a series can be defined as a finite list of 
finite fully-defined candidate values. This is achieved using the 
drawnFrom combinator. 

drawnFrom :: [a] -> Cons a 

drawnFrom xs = foldr (\/) empty (map cons xs) 0 

The depth parameter 0 is irrelevant in the above definition, as it is 
not inspected by any of the combinators used. 

Example 10 Here is the Serial instance for Int. 

instance Serial Int where 
series d = drawnFrom [-d..d] 

Using drawnFrom, primitive values of type Integer, Char, Float 
and Double are generated just as they are in SmallCheck. □

4.6 Implementation 

This section presents the Lazy SmallCheck implementation. Only 
code for parallel conjunction, the Testable class, and for displaying
counter-examples and counting tests is omitted.

Partially-defined Inputs 

The central idea of Lazy SmallCheck is to generate partially- 
defined inputs, that is, inputs containing some calls to error. An 
example of a partially-defined input of type Prop is 

Or (Or (Var Q) (Not (error "_|_"))) (error "_|_")

Using imprecise exceptions (Peyton Jones et al. 1999), one can
apply a property to the above term and observe whether it evaluates
to True, False, or error "_|_". However, since the input contains
several calls to error "_|_", it cannot be determined which
one was demanded by the program. This is the motivation for tagging
each error with its position in the tree-shaped term. A position
is a list of integers, uniquely describing the path from the root
of the term to a particular sub-term.

type Pos = [Int] 

For example, the position [1,0] refers to the 0th child of
the root constructor’s 1st child. Lazy SmallCheck encodes such
positions in the string passed to error. Using the helper function 

hole :: Pos -> a 

hole p = error (sentinel : map toEnum p) 

the above example term of type Prop is now represented as follows. 

Or (Or (Var Q) (Not (hole [0,1,0]))) (hole [1]) 

Each argument to error is prefixed with a sentinel character, 
allowing holes to be distinguished from possible calls to error 
occurring in the property. 

sentinel :: Char 
sentinel = '\0'

Answers 

The data type Answer is used to represent the result of a property 
applied to a partially-defined input. 

data Answer = Known Bool | Unknown Pos

Using imprecise exceptions, the following function turns a Bool 
into an Answer. 



answer :: Bool -> IO Answer
answer a =
  do res <- try (evaluate a)
     case res of
       Right b -> return (Known b)
       Left (ErrorCall (c:cs)) | c==sentinel ->
         return (Unknown (map fromEnum cs))
       Left e -> throw e

The functions try, evaluate, and throw are all exported by
Haskell’s Control.Exception library: evaluate forces evaluation
of the Boolean value passed to it, before returning it in an IO
action, and try runs the given IO action, and returns a Right constructor
containing the action’s result if no exception was raised,
otherwise it returns a Left constructor containing the exception.
If the exception represents a hole, then the position of demand is 
extracted and returned. Otherwise the exception is re-thrown. 
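
For experimentation, the pieces introduced so far can be assembled into one self-contained module. The sketch below follows the definitions in the text, with two departures that are our own choices: the derived Eq and Show instances on Answer, and the use of throwIO in place of throw for re-raising inside IO.

```haskell
import Control.Exception (ErrorCall (..), evaluate, throwIO, try)

type Pos = [Int]

data Answer = Known Bool | Unknown Pos
  deriving (Eq, Show)

sentinel :: Char
sentinel = '\0'

-- A hole is a call to error whose message encodes the hole's position.
hole :: Pos -> a
hole p = error (sentinel : map toEnum p)

-- Force a Boolean and classify the outcome; foreign errors are re-thrown.
answer :: Bool -> IO Answer
answer a = do
  res <- try (evaluate a)
  case res of
    Right b -> return (Known b)
    Left (ErrorCall (c:cs)) | c == sentinel ->
      return (Unknown (map fromEnum cs))
    Left e -> throwIO e
```

For example, answer (null (hole [0,1] :: [Int])) returns Unknown [0,1] because null must inspect the hole, while answer (True || hole [3]) returns Known True because the hole is never demanded.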

When a property applied to a term yields Unknown pos, Lazy 
SmallCheck refines the term by defining it at position pos. 

Refinement 

Looking under the hood, the Cons data type is a little more complicated
than the simple list it replaces in SmallCheck. Lazy SmallCheck
must not only generate inputs but also take an existing input
and refine it at a particular position.

data Cons a = Type :*: [[Term] -> a] 

This data type can be read as follows: to construct a value of type 
a, one must have a sum-of-products representation of the type, 

data Type = SumOfProd [[Type]] 

and a list of conversion functions (one for each constructor) from 
a list of universal terms (representing the arguments to the constructor)
to an actual value of type a. A universal term is either
a constructor with an identifier and a list of arguments, or a hole 
representing an undefined part of the input. 

data Term = Ctr Int [Term] | Hole Pos Type

Working with universal terms, the refinement operation can be 
defined generically, once and for all: it walks down a term following 
the route specified by the position of demand, 

refine :: Term -> Pos -> [Term]
refine (Ctr c xs) (i:is) =
  map (Ctr c) [ls ++ y:rs | y <- refine x is]
  where (ls, x:rs) = splitAt i xs
refine (Hole p (SumOfProd sop)) [] = new p sop

and when it reaches the desired position, a list of constructors of 
the right type is inserted, each of which is applied to undefined 
arguments. 

new :: Pos -> [[Type]] -> [Term]
new p sop =
  [ Ctr c (zipWith (\i -> Hole (p++[i])) [0..] ts)
  | (c, ts) <- zip [0..] sop ]
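
To see refinement at work, the definitions can be collected into a standalone fragment and applied to a small example. The derived Eq and Show instances and the example types boolT and pairT are additions made here for illustration, not part of the library. The example types are kept finite so that derived equality terminates.

```haskell
type Pos = [Int]

data Type = SumOfProd [[Type]]
  deriving (Eq, Show)

data Term = Ctr Int [Term] | Hole Pos Type
  deriving (Eq, Show)

-- Walk down the term along the demanded position, then expand the hole.
refine :: Term -> Pos -> [Term]
refine (Ctr c xs) (i:is) =
  map (Ctr c) [ls ++ y:rs | y <- refine x is]
  where (ls, x:rs) = splitAt i xs
refine (Hole p (SumOfProd sop)) [] = new p sop

-- One candidate per constructor, each applied to fresh holes.
new :: Pos -> [[Type]] -> [Term]
new p sop =
  [ Ctr c (zipWith (\i -> Hole (p ++ [i])) [0..] ts)
  | (c, ts) <- zip [0..] sop ]

-- Hypothetical example types: Bool has two nullary constructors,
-- and a pair has one constructor with two Bool fields.
boolT, pairT :: Type
boolT = SumOfProd [[], []]
pairT = SumOfProd [[boolT, boolT]]
```

Refining Hole [] pairT at position [] yields the single term Ctr 0 [Hole [0] boolT, Hole [1] boolT]. Refining that term at position [1] yields two terms, one for each Boolean constructor in the second field, with the first field still a hole.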

Series Combinators 

The Series combinators cons, empty, \/ and >< are defined in 
Figure 1, along with two auxiliary functions. The conv auxiliary 
allows a conversion function of type Term -> a to be obtained 
from the second component of a Cons a value. The nonEmpty 
auxiliary is used to ensure that a partially-defined value is not 
generated when there is no fully-defined refinement of that value 
within the depth limit. 


cons :: a -> Series a
cons a d = SumOfProd [[]] :*: [const a]

empty :: Series a
empty d = SumOfProd [] :*: []

(\/) :: Series a -> Series a -> Series a
(a \/ b) d = SumOfProd (psa ++ psb) :*: (ca ++ cb)
  where SumOfProd psa :*: ca = a d
        SumOfProd psb :*: cb = b d

(><) :: Series (a -> b) -> Series a -> Series b
(f >< a) d =
    SumOfProd [ta:p | notTooDeep, p <- ps] :*: cs
  where SumOfProd ps :*: cfs = f d
        ta :*: cas = a (d-1)
        cs = [ \(x:xs) -> cf xs (conv cas x)
             | notTooDeep, cf <- cfs ]
        notTooDeep = d > 0 && nonEmpty ta

nonEmpty :: Type -> Bool
nonEmpty (SumOfProd ps) = not (null ps)

conv :: [[Term] -> a] -> Term -> a
conv cs (Hole p _) = hole p
conv cs (Ctr i xs) = (cs !! i) xs


Figure 1. Lazy SmallCheck’s Series combinators. 

Refutation Algorithm 

The algorithm to refute a property takes two parameters, the property
to refute and an input term, and behaves as follows.

refute :: (Term -> Bool) -> Term -> IO ()
refute p x = do
  ans <- answer (p x)
  case ans of
    Known True -> return ()
    Known False -> putStrLn "Counter example found"
                >> exitWith ExitSuccess
    Unknown pos -> mapM_ (refute p) (refine x pos)

A simple variant of Lazy SmallCheck’s depthCheck function
can now be defined.

check :: Serial a => Int -> (a -> Bool) -> IO ()
check d p = refute (p . conv cs) (Hole [] t)
  where t :*: cs = series d

For simplicity of presentation, these two definitions do not attempt
to print counter examples, count the number of tests performed,
or support checking of multi-argument properties.

Parallel Conjunction 

Parallel conjunction is a straightforward extension to the refutation 
algorithm. The main difference is that answers contain values of 
type Property rather than Bool. Internally, a Property is just a 
representation of a logical formula. To evaluate a Property of the 
form p *&* q, p is evaluated first and if it is unknown, then q is 
also evaluated, without refining the input as demanded by p. If p 
or q evaluates to False then the value of the whole conjunction is 
taken to be False. If both p and q are unknown, then the input is 
refined at the position demanded by p. This means that the number 
of tests generated can decrease when switching from && to *&*, but 
never increase. There is however an evaluation overhead when p is 
unknown, because *&* will evaluate q in this case and && will not.
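
The difference between the two conjunctions can be summarised by a small pure model over answers that have already been computed. This is a sketch of the behaviour described above, not the library's implementation: the names andSeq and andPar are hypothetical, and the real evaluator obtains each answer by running a conjunct under an exception handler.

```haskell
type Pos = [Int]

data Answer = Known Bool | Unknown Pos
  deriving (Eq, Show)

-- Sequential conjunction: an unknown left operand hides the right one.
andSeq :: Answer -> Answer -> Answer
andSeq (Known True)  q = q
andSeq (Known False) _ = Known False
andSeq (Unknown p)   _ = Unknown p

-- Parallel conjunction: a falsified operand on either side decides the
-- result; if both are unknown, the left operand's position is refined.
andPar :: Answer -> Answer -> Answer
andPar (Known False) _             = Known False
andPar (Known True)  q             = q
andPar (Unknown _)   (Known False) = Known False
andPar (Unknown p)   _             = Unknown p
```

In this model the two functions differ only when the left operand is unknown and the right operand is false, which is exactly the case in which parallel conjunction saves refinements.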



Variations 

Two alternative implementations of Lazy SmallCheck have been 
explored, both avoiding repeated conversion of universal terms to 
Haskell values of a particular type. One uses Data.Generics and
only works in GHC, while the other requires an extra method in 
the Serial class so that refinement can be defined on a per-type 
basis. These variants are more efficient, but by no more than a 
factor of three in our experience. The implementation presented 
here has the advantage of giving depth and generation control to 
the programmer in a simple manner that is largely compatible with 
the core SmallCheck subset. 

5. Comparative Evaluation 

Previous sections have included some in-principle comparisons 
between the three libraries. This section presents some quantitative 
results. Table 3 shows the runtimes of several example properties 
tested to varying depths with SmallCheck and Lazy SmallCheck. 
QuickCheck is not represented in this table because it does not 
have the same notion of a depth bound. However, the time taken 
by QuickCheck to refute an invalid property can be meaningfully 
compared with that taken by SmallCheck and Lazy SmallCheck; 
such timings are noted in the discussion. 

All the example properties are first-order and universally- 
quantified. All test generators are written using the simple standard 
pattern, with no customisation. The following paragraphs discuss 
the results, focusing on some of the more interesting examples. 

Property     Tool   Depth 3   Depth 4
RedBlack     L      0.03      *0.15
             S      0.20      x
Turner       L      0.01      0.47
             S      0.01      0.07
SumPuz       L      0.05      3.68
             S      0.05      4.48
Huffman2     L      0         0
             S      0         0.01
Countdown1   L      0.01      0.14
             S      0.05      17.43
Countdown2   L      0.01      1.23
             S      0.01      1.44
Circuits2    L      0         0.01
             S      0         0.01
Circuits3    L      0.06      13.28
             S      0.02      5.08
Catch        L      0.07      6.22
             S      0.02      88.23
Mate         L      0         0.37
             S      0.06      x

Depth 5 6 7

421.80 x
682.86 x
0.63 22.9 x
7.65 x
2.27 39.3 800.4
666.95 x
737.10 x
0.01 0.03 0.06
0.52 63.80 x

830.02 x
*29.87

Property     Tool   Depth 7   Depth 8   Depth 9   Depth 10   Depth 11
ListSet      L      0.01      0.02      0.03      0.06       0.13
             S      0.05      0.39      4.06      694.10     x
Huffman1     L      0.27      2.76      27.57     315.81     x
             S      0.08      0.73      7.69      90.38      x
Circuits1    L      0.06      0.29      1.62      10.06      70.44
             S      0.04      0.20      1.21      8.38       65.88

Key: * Counter example found; x Longer than 20 minutes;
     L Lazy SmallCheck; S SmallCheck

Table 3. Times to check benchmark properties using SmallCheck
and Lazy SmallCheck at various depths.

RedBlack The RedBlack program is an implementation of sets
using ordered Red-Black trees, taken from (Okasaki 1999). A fault
was fabricated in the rebalancing function by swapping two subtrees.
This is where most of the complexity in the implementation
lies and is a likely source of a programming mistake. Okasaki’s tree
representation is as follows.

data Colour = R | B
data Tree a = E | T Colour (Tree a) a (Tree a)

A predicate defining ordered Red-Black trees was added, capturing
three things: that trees are ordered (the ord invariant), that
no red node has a red parent (the red invariant), and that every path
from the root to an empty node contains the same number of black
nodes (the black invariant).

redBlack t = ord t && black t && red t

The following property was also added.

prop_insertRB :: Int -> Tree Int -> Bool
prop_insertRB x t =
  redBlack t ==> redBlack (insert x t)

No counter example was found within 20 minutes of testing
at depth 4 using SmallCheck. QuickCheck, with simple random
generation of trees, did not find a counter example after 100,000
batches of 1000 tests (amounting to 32 minutes of testing). Testing
with Lazy SmallCheck revealed the fault in a fraction of a second at
depth 4, and with the fault removed, verified the property at depth
4 within 7 seconds.

The number of tests is a few times lower when using parallel
conjunction inside the redBlack invariant. However, in this case
the evaluation overhead of using *&* is substantial and cancels the
benefit of fewer tests.

Huffman The Huffman program is taken from (Bird 1998). It
contains functions for both compression and decompression of
strings, along with a function for building Huffman trees. A Huffman
tree is a binary tree with symbols at its leaves, and the path
from the root to any symbol describes the unique, variable-length
sequence of zeros and ones representing that symbol.

Two properties were added to Bird’s program. The first states
that the decompresser (decode) is the inverse of the compressor
(encode).

prop_decEnc cs = 

length ft > 1 ==> decode t (encode t cs) == cs 
where ft = collate cs 
t = mkHuff ft 

Here, collate builds a frequency table (ft) for an input string, 
and mkHuff builds a Huffman tree from a frequency table. 

The second property asserts that mkHuff produces optimal 
Huffman trees, that is, for all binary trees t, if t is a Huffman
tree then it has a cost no less than that produced by mkHuff. A 
binary tree is only a Huffman tree (as determined by isHuff) if it 
contains every symbol in the source text exactly once. The cost of 
a Huffman tree is defined as the sum of each symbol’s frequency 
multiplied by its depth in the Huffman tree. 

prop_optimal cs t = 

isHuff t cs ==> cost ft t >= cost ft (mkHuff ft) 
where ft = collate cs 
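
The cost measure defined above can be sketched as follows. The tree and frequency-table types, and the helper depths, are hypothetical stand-ins chosen for illustration. Bird's program may represent these differently.

```haskell
-- Hypothetical representation: symbols at the leaves of a binary tree.
data Tree = Leaf Char | Fork Tree Tree

type FreqTable = [(Char, Int)]

-- Depth of every symbol in the tree, with the root at depth 0.
depths :: Tree -> [(Char, Int)]
depths = go 0
  where
    go d (Leaf c)   = [(c, d)]
    go d (Fork l r) = go (d + 1) l ++ go (d + 1) r

-- Sum over all symbols of frequency multiplied by depth.
cost :: FreqTable -> Tree -> Int
cost ft t = sum [ f * d | (c, d) <- depths t, Just f <- [lookup c ft] ]
```

For the table [('a',5),('b',2),('c',1)], the tree Fork (Leaf 'a') (Fork (Leaf 'b') (Leaf 'c')) has cost 11, whereas placing 'c' nearest the root instead of 'a' gives cost 15.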

In checking the first property, SmallCheck was more efficient
than Lazy SmallCheck by a constant factor of 3. This property is
hyper-strict for most inputs. On the second property, due to the
condition that input trees must be Huffman trees, Lazy SmallCheck
allowed testing to one level deeper within the 20 minute cut-off.

Turner The Turner program is a compiler from lambda expressions
to Turner’s combinators, as defined in (Peyton Jones 1987).
In particular, it provides a function (abstr) to abstract a free variable
from an expression by introducing combinators from a known,
fixed set. The property of interest is Turner’s law of abstraction
(Turner 1979), stating that if a variable is abstracted from an expression,
and the resulting expression is applied to that variable,
then one ends up with the original expression again.

prop_abstr v e = reduce (abstr v e :@ V v) == e

Here, :@ is function application and reduce applies combinator reduction
rules to a given expression. This property can only return
True after demanding the whole input so, due to strictness, SmallCheck
has an advantage, this time by a factor of 7 at depth 4.

Mate The Mate program solves mate-in-N chess problems. It 
represents a chess board as two lists, the first containing white’s 
piece-position pairs and the second containing black’s.

data Board = Board [(Kind,Square)] [(Kind,Square)]
data Kind = King | Queen | Rook
          | Bishop | Knight | Pawn
type Square = (Int,Int)
data Colour = Black | White

It includes a function checkmate returning whether or not a 
given colour is checkmated on a given board. A property was added 
stating that for all chess boards b, if b is a valid board and white has
only a king and a pawn, then black cannot be in checkmate. 

prop_checkmate b@(Board ws bs) =
  ( length ws == 2
  && Pawn `elem` map fst ws
  && validBoard b
  ) ==> not (checkmate Black b)

A valid board is one satisfying a number of healthiness criteria, 
such as each side has exactly one king, kings cannot be placed on 
touching squares, and no two pieces can occur on the same square. 

Neither SmallCheck at depth 4 after 20 minutes, nor QuickCheck
with 100,000 batches of 1000 random tests after 18 minutes, revealed
a counter example. Lazy SmallCheck within 30 seconds at
depth 5 produces

Counter example found: 

Board [(King,(3,2)),(Pawn,(2,1))] 

[(Queen,(1,3)),(King,(1,2)),(Bishop,(1,1))] 

The order of conjuncts in the property has a significant impact 
on performance, and a lot of experimentation was required to find 
the best order. The time taken to find a counterexample was more 
than 20 minutes if the order was unfortunately chosen. However, 
using parallel conjunction, no ordering required more than 22 seconds
to find a counter example.

Other Examples The remaining examples follow a similar pattern.
SmallCheck is more effective on strict properties and Lazy
SmallCheck wins for lazy ones. Of these examples, ListSet is the
set implementation using ordered lists (along with the insertion
property) given earlier, Countdown is a solver for a popular numbers
game (along with a lemma and a refinement theorem) taken
from (Hutton 2002), SumPuz is a cryptarithmetic solver (with a 
soundness property) from (Claessen et al. 2002), Circuits is part 
of a library from the Reduceron (Naylor and Runciman 2008), and 
Catch is a specification (with a soundness property) for part of the 
Catch tool (Mitchell and Runciman 2007). 


Summary of Results In two of the thirteen example properties, 
Lazy SmallCheck found a counter example in good time, and 
SmallCheck and QuickCheck did not. In five, Lazy SmallCheck
permitted deeper checking than SmallCheck, and in five others,
SmallCheck had a constant factor advantage over Lazy SmallCheck,
ranging from a small 1.04 factor to a more significant 7.

Five of the example properties have an implication where the 
condition is composed of several conjuncts, and could potentially 
be improved by using parallel conjunction. In two of these, parallel 
conjunction had no impact on the number of tests, but neither did it 
introduce a significant evaluation overhead. In another, the number 
of tests was reduced, but this was cancelled out by the evaluation 
overhead. And in another, parallel conjunction reduced the runtime 
by up to a factor of three for some conjunct orderings, but had 
no effect on others. In the remaining example, the use of parallel 
conjunction eliminated the need to put a long series of conjuncts in 
a particular order for a counter example to be found. 

6. Related work 

Needed Narrowing Lazy SmallCheck’s refutation algorithm is
closely related to needed narrowing (Antoy et al. 1994), an evaluation
strategy used by some functional-logic languages, including
Curry (Hanus), and some Haskell analysis tools (Lindblad 2008)
(Naylor and Runciman 2007). Like Lazy SmallCheck, needed narrowing
allows functions to be applied to partially-defined inputs,
but this is achieved using logical variables rather than calls to error.
As needed narrowing is designed for functional-logic programs, it
also deals with non-deterministic functions.

A typical implementation of needed narrowing stores the partially
evaluated result after each test, and resumes the evaluation after
refining the input. Lazy SmallCheck instead evaluates the property
from scratch every time an undefined part of the input is demanded.
This means that needed narrowing is more efficient. For
small inputs, it would be interesting to explore just how big (or
small) this benefit is.

Residuation Parallel conjunction is related to residuation, another
evaluation strategy used by some functional-logic languages
including Curry (Hanus) and Escher (Lloyd 1999). Under residuation,
if the value of a logical variable is demanded by some logical
conjunct in the system, then that conjunct suspends on the variable,
and another conjunct is evaluated. If evaluation of this second conjunct
happens to instantiate the variable suspended on by the first,
then the first conjunct is resumed.

In parallel conjunction, when evaluation of the first conjunct 
calls error, the second conjunct is immediately evaluated on the 
same input. If the second conjunct also calls error then the input 
is refined. Therefore, a parallel conjunction of the form p *&* q
is similar to evaluating p by residuation and q by narrowing. The 
end result in both cases is that if either conjunct is falsified, then 
so is the whole conjunction. There is no need for resumption and 
suspension mechanisms in Lazy SmallCheck because it evaluates 
the conjunction from scratch every time a refinement is made. 

Gast Gast (Koopman et al. 2002) is a library for property-based 
testing in Clean. It exploits Clean’s generic programming features 
to offer a default test-generator for all user-defined types. Like 
SmallCheck, it generates fully-defined and finite values. Unlike 
SmallCheck, it employs a blend of random and systematic generation.
Constructors of an algebraic data type are selected at random,
and duplicate tests are avoided by keeping a record of which inputs
have been tried already. The authors also mention testing of
existential properties, but without giving details. 



EasyCheck EasyCheck (Christiansen and Fischer 2008) is another
testing library, written in Curry. Like Lazy SmallCheck it
employs narrowing to achieve property driven generation of data.
The library makes use of the data refinement and narrowing mechanisms
built into Curry. It provides a number of combinators for expressing
properties about non-deterministic functions. Apart from
this, the main difference is that EasyCheck uses level diagonalisation,
which has the advantage that it allows systematic generation of
deep and shallow inputs in a fair order. There are also some disadvantages
of level diagonalisation: any counter examples produced
are not necessarily minimal, and it is not clear to the programmer
which inputs have been tested and which have not.

7. Future Work and Conclusions 

Future Work It would be interesting to investigate higher-order
properties and existential quantifiers in the context of Lazy SmallCheck.
It would also be interesting to compare Lazy SmallCheck
with a full-strength narrowing implementation, such as the Münster
Curry Compiler (Lux 2003). This would help establish whether it
is worth adding narrowing to an existing Haskell compiler to aid
property-based testing, or whether lazy evaluation and imprecise
exceptions already provide most of the benefit. Another avenue for
investigation would be the ability to import QuickCheck, SmallCheck,
and Lazy SmallCheck in a program and test the same properties
using any tool.

Conclusions If a property is refuted by SmallCheck then a simplest
counter example is reported, and such a counter example is
usually the easiest to investigate. Alternatively, if a property is not
refuted then a clearly-defined portion of the input space on which
it holds is reported, and this knowledge is valuable in judging the
effectiveness of testing. In each case the SmallCheck user learns
something useful that the QuickCheck user would not. Furthermore,
the SmallCheck user can (1) write data generators easily using
a simple standard pattern; (2) view counter examples of higher-order
properties; and (3) enjoy a richer specification language supporting
(unique) existential quantification.

Using Lazy SmallCheck, the programmer can specify conditional
properties as simple logical implications and typically have a
plentiful supply of condition-satisfying inputs generated automatically.
This is thanks not just to Haskell’s lazy evaluation strategy,
which can compute well-defined outputs for partially-defined inputs,
but also to parallel conjunction. Parallel conjunction reduces
the need for programmers to tweak conjunct orderings in properties
in order to obtain the maximum benefit of Lazy SmallCheck.
Of course, it is very difficult to say how often conditional properties
occur in general, but they arose quite readily in our thirteen benchmark
properties, the majority of which were taken from existing
programs described in the literature. In seven of the thirteen properties,
Lazy SmallCheck allowed deeper testing than SmallCheck,
and in two of these, counter examples were revealed that were simply
infeasible to find using QuickCheck and SmallCheck, at least
without writing a custom generator.

Although SmallCheck and Lazy SmallCheck are sometimes
more effective than QuickCheck, the reverse is also true. For example,
as part of his ICFP’07 invited talk, Hughes tested an SMS
message-packing program using QuickCheck. QuickCheck uncovered
a bug when packing messages of multiple-of-eight length.
Such large, strictly-demanded messages would be outside the reach
of SmallCheck and Lazy SmallCheck.

Put simply: SmallCheck, Lazy SmallCheck and QuickCheck 
are complementary approaches to property-based testing in Haskell. 


Availability 

SmallCheck and Lazy SmallCheck are freely available from
http://hackage.haskell.org/.
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